Abstract — The problem of joint source-channel coding is 
considered for a stationary remote (noisy) Gaussian source and 
a Gaussian channel. The encoder and decoder are assumed 
to be causal and their combined operations are subject to a 
delay constraint. It is shown that, under the mean-square error 
distortion metric, an optimal encoder-decoder pair from the 
linear and time-invariant (LTI) class can be found by mini- 
mization of a convex functional and a spectral factorization. 
The functional to be minimized is the sum of the well-known 
cost in a corresponding Wiener filter problem and a new term, 
which is induced by the channel noise and whose coefficient is 
the inverse of the channel's signal-to-noise ratio. This result 
is shown to also hold in the case of vector-valued signals, 
assuming parallel additive white Gaussian noise channels. It is 
also shown that optimal LTI encoders and decoders generally 
require infinite memory, which implies that approximations are 
necessary. A numerical example is provided, which compares 
the performance to the lower bound provided by rate-distortion 
theory. 

Index Terms — Analog transmission, causal coding, delay 
constraint, joint source-channel coding, MSE distortion, remote 
source, signal-to-noise ratio (SNR). 



I. Introduction 

THE design of systems for point-to-point commu- 
nication of analog data over noisy communication 
channels has a theoretical basis in Shannon's separation 
theorem. The theorem gives a bound on the optimal 
performance theoretically achievable (OPTA) by any communication system. Specifically, it says that the distortion cannot be made smaller than $D_{\min}$, which can be obtained from

$$R(D_{\min}) = C, \tag{1}$$
Fig. 1. The encoder measures the source sequence $s$ plus the measurement noise $m$ and transmits $t$ over the channel. The decoder receives $t$ plus the channel noise $n$ and forms $\hat{s}$, the estimate of $s$. Each source element has to be estimated after a given delay in order to minimize the error $e$.



where $R(D)$ is the rate-distortion function, which is given
by the source statistics and the distortion measure, and $C$
is the channel capacity. Under appropriate assumptions,
the separation theorem also shows that it is possible to
come arbitrarily close to $D_{\min}$ by the combination of
source coding and channel coding. These codes can, in 
principle, be independently developed without loss. This 
means that the channel code designer does not need to 
know anything about the source, and vice-versa, which is 
clearly a practical advantage. 
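
As a concrete illustration of the bound, consider a special case that is assumed here only for illustration and is not the general setting of this paper: a white Gaussian source with variance `var_s`, sent over a memoryless AWGN channel with one channel use per source sample. In that case $R(D) = \tfrac{1}{2}\log(\sigma_s^2/D)$ and $C = \tfrac{1}{2}\log(1 + \mathrm{SNR})$, so that $D_{\min} = \sigma_s^2/(1 + \mathrm{SNR})$. The function names below are hypothetical, and the sketch is meant as a sanity check rather than a general OPTA calculator.

```python
import math


def rate_distortion_white_gaussian(var_s, dist):
    """R(D) in nats per sample for a white Gaussian source under MSE distortion."""
    if dist >= var_s:
        return 0.0  # distortion var_s is achievable without sending anything
    return 0.5 * math.log(var_s / dist)


def awgn_capacity(snr):
    """Capacity in nats per channel use of an AWGN channel with the given SNR."""
    return 0.5 * math.log(1.0 + snr)


def opta_distortion(var_s, snr):
    """Solve R(D_min) = C for the white-source, AWGN special case."""
    return var_s / (1.0 + snr)


if __name__ == "__main__":
    d_min = opta_distortion(var_s=2.0, snr=3.0)
    # Both sides of R(D_min) = C should agree up to rounding.
    print(d_min, rate_distortion_white_gaussian(2.0, d_min), awgn_capacity(3.0))
```

For colored sources or channels the rate-distortion function and the capacity are given by water-filling expressions instead, which this sketch does not cover.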

The separation theorem does, however, rely on asymp- 
totic arguments where the delay and the size of the code- 
book are allowed to increase indefinitely. Consequently, 
it does not hold in presence of delay or complexity 
constraints and imposing such constraints generally ren- 
ders the distortion bound unachievable. Since infinitely 
large delays or codebooks are not possible in practice, 
a suboptimal performance may have to be accepted. 
Moreover, to minimize the distortion in the presence of 
these constraints, it may be necessary to abandon the 
separation-based design and consider joint source-channel 
codes. 

This is the subject of the present paper, where we 
consider transmission of a stationary colored Gaussian 
source over a power-constrained channel with additive 
colored Gaussian noise, under the mean-square error 
(MSE) distortion criterion. The encoder and decoder are 
constrained to be causal and their combined operations 
are subject to a delay constraint. Further, we allow for 
the possibility of a remote (noisy) source. The situation 
is illustrated in Fig. 1.

The encoder and decoder will be restricted to the class 
of linear and time-invariant (LTI) filters. The linearity 
assumption and the additive noise models allow us to for- 
mulate the distortion minimization as a transfer function 
optimization problem. The main result is that a jointly 



optimal encoder-decoder pair from the LTI class can be
found by first minimizing a functional of the form

$$\|R - X\|_2^2 + \frac{1}{\sigma^2} \|XN\|_1^2, \tag{2}$$

where $R \in \mathcal{L}_\infty$ and $N \in \mathcal{H}_\infty$ are given transfer functions
and $\sigma^2$ is the signal-to-noise ratio (SNR), over $X \in \mathcal{H}_2$.
The encoder and decoder are then obtained from a spectral 
factorization. A corresponding result is also shown to hold 
in the case with vector-valued signals and parallel additive 
white Gaussian noise (AWGN) channels. 

The restriction to linear encoders and decoders may 
obviously result in suboptimal solutions. Nevertheless, the 
linear solution to any problem instance will provide an 
upper bound to the minimum distortion possible for the 
given SNR, delay constraint, and signal spectra. Moreover, 
the proposed design methods are relatively simple and 
computationally feasible. 

An application where this problem formulation could be 
relevant is the transmission of speech in mobile communi- 
cation. The source signal to be estimated at the receiver 
is the speech signal. The delay constraint is based on the 
acceptable latency and the noise is any background sound 
present at the microphone. 

The rest of this section will present the relevant previous 
research and alternative interpretations of the problem. 
Section II presents the mathematical notation used in this
paper. The exact problem formulation is given in Section
III. Section IV is devoted to the solution of the problem,
first in the scalar and then in the vector case, followed by
a theorem stating that optimal LTI encoders and decoders
require infinite memory. Section V presents a procedure
for numerical solution and a numerical example where the
performance of the optimal LTI encoders and decoders is
compared to the lower bound provided by the separation
theorem. Finally, Section VI presents the conclusions and
discusses further research. Some technical lemmas have 
been put in the appendix. 



A. Previous Research 

The problem studied in this paper is closely related 
to that of finding the optimal modulation matrices for 
linear coding and decoding of a Gaussian vector source 
for transmission over a Gaussian vector channel. Optimal 
modulation matrices were derived in [2], where it was 
also shown that linear modulation is only optimal when 
the source and channel can be matched. That is, when 
their dimensions match and the source and channel noise 
covariance matrices can be diagonalized into uniform 
variances. The same problem was considered in [3], where 
the solution was also given for the case when the channel 
components have individual power constraints. The per- 
formance of optimal linear coding was compared, for a 



number of cases, to the OPTA, given by (1), in [4].$^1$

The general suboptimality of linear coding arises from 
the fact that it cannot match any colored Gaussian source 
to any colored Gaussian channel. It has recently been 
shown, however, that such matching can be achieved by 
the combination of prediction and modulo-lattice operations [5].

The problem of coding with a remote source was first 
considered for the Gaussian case with additive noise and 
MSE distortion in [6]. It was shown that the problem is 
asymptotically equivalent to, and can thus be reduced to, 
the fully observed case and that an optimal encoder gen- 
erally has a structure consisting of an optimal estimator 
followed by optimal encoding for a noise-free source. This 
structural result was generalized to the non-Gaussian and
finite time horizon cases in [7]. The problem was further
studied in [8], where it was noted that in the case of
white source noise, the criterion in the reduced problem
is given by the conditional expectation of the original
criterion given the encoder input. It was pointed out in
[9] that the equivalence in [6] actually was proved for the
one-shot problem as well. Moreover, it was shown that 
the reduction to the non-remote problem follows from a 
general "disconnection principle". In the literature, the 
problem of coding with a remote source often includes 
the possibility of noise at the receiver as well. The main 
motivation for excluding that possibility here is the fact, 
noted in [7], that the optimality of an encoder-decoder 
design is independent of additive and independent zero- 
mean noise at the receiver. 

Coding problems with delay constraints have not re- 
ceived the same level of attention as their classical 
counterparts. Some structural results have, however, been 
obtained. The optimal causal source coder for a white 
source has been found to be memoryless [10]. For a Markov
source of order $k$ and delay constraint $d$, an optimal
real-time source coder only needs to use the last $\max\{k, d+1\}$
source symbols plus the current state of the decoder. No 
such memory bound is given, however, when the encoder 
does not have access to the decoder state [11]. Joint source- 
channel coding with noiseless feedback was considered for 
finite alphabet sources in [12], where it was demonstrated
that feedback is useful in general, but that coding is useless
for a class of channels with a certain symmetry property.
The results in [11], [12] have been generalized in [13], which
also gives a nice overview of the literature on real-time
coding. Conditions have also been found for when optimal
performance can be achieved without coding (even when
allowing coding systems with arbitrary delay) [14].

Since the OPTA given by (1) cannot generally be 
achieved in the presence of delay constraints, a relevant 
question to ask is of course what the OPTA is when there 

$^1$In all of these three papers, one may view the source vectors 
as vectors in a one-shot problem, where there is no dependence over 
time, or as finite sequences. In the former interpretation, the solution 
satisfies a zero-delay constraint, but this is not very interesting due to 
the lack of dependence. In the latter interpretation, a delay constraint 
would translate to requiring the matrices to be lower-triangular, 
which is not done. 



are such constraints. A partial answer in the form of 
upper bounds on the rate-distortion functions for zero- 
delay and causal source coding is given in the recent 
paper [15]. Interestingly, some of the results in that paper 
are obtained by solving a problem which is somewhat 
similar to the one considered in this paper. The solution 
of that problem can be applied to solve some particular 
instances of the problem considered in this paper. The 
main difference is that they assume that the encoder 
has access to noiseless feedback from the channel output. 
Moreover, only the scalar case with zero delay constraint 
and no noise at the source is considered. The same problem 
has previously been considered in [16], [17] as a means to 
design optimal scalar feedback quantization schemes. 

Real-time source coding for a remote source has been 
considered in [18]. The structural results of [11], [12]
were extended to cover remote sources in [19], which
also presented a separation result for the linear-quadratic
Gaussian case similar to the one in [6]. A method for design
of optimal real-time coding systems for noisy channels
was presented in [20] using noisy feedback and in [21]
without feedback. However, there seems to be no method
for efficient numerical application of the solution.

B. Alternative Interpretations 

It is possible to make two alternative interpretations of 
the problem illustrated in Fig. 1.

1) Connection to Wiener Filter: The problem of es- 
timating a signal that is measured with additive noise 
under an MSE criterion is solved by the Wiener filter 
[22]. The filter is usually obtained by solving the Wiener-Hopf
equations, but can also be expressed in the frequency
domain as the stable filter $K$ that minimizes

$$\|(z^{-d} - K)S\|_2^2 + \|KM\|_2^2, \tag{3}$$

where $d$ is the allowed time delay and $S$ and $M$ are transfer
functions that represent the frequency characteristics of 
the signal of interest and the measurement noise, respec- 
tively. 

It is possible to interpret the problem in Fig. 1 as a 
distributed Wiener filtering problem, where the filter is 
separated into two different locations. The communication 
channel is used to model the communication constraint 
between the two locations. This interpretation is strengthened
by the fact that minimization of (3) is equivalent to
minimizing

$$\|R - X\|_2^2 \tag{4}$$

where $R$ is the same transfer function as in (2), over
$X \in \mathcal{H}_2$. Comparing (2) with (4), it is seen that the cost
in the present problem is equal to the cost in a Wiener 
filtering problem plus an additional term, which is induced 
by the communication channel. Since the coefficient of the 
new term is the inverse of the channel's SNR, the cost 
is asymptotically equal to that in the Wiener filtering 
problem when the SNR tends to infinity. 
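
The relation between the two costs can be checked numerically. The sketch below evaluates the Wiener cost (4) and the channel-aware cost (2) from frequency-response samples on a uniform grid of the unit circle. The grid averages used as approximations of the norms, and all function names, are assumptions made for this illustration and are not part of the original development.

```python
def norm2_sq(x):
    """Grid approximation of the squared 2-norm: average of |x|^2 over the circle."""
    return sum(abs(v) ** 2 for v in x) / len(x)


def norm1(x):
    """Grid approximation of the 1-norm: average of |x| over the circle."""
    return sum(abs(v) for v in x) / len(x)


def wiener_cost(r, x):
    """Cost (4): squared 2-norm of R - X."""
    return norm2_sq([a - b for a, b in zip(r, x)])


def channel_cost(r, x, n, snr):
    """Cost (2): Wiener cost plus the channel-induced term, weighted by 1/SNR."""
    xn = [a * b for a, b in zip(x, n)]
    return wiener_cost(r, x) + norm1(xn) ** 2 / snr


if __name__ == "__main__":
    r = [1.0, 1.0, 1.0, 1.0]   # samples of R (hypothetical values)
    x = [0.5, 0.5, 0.5, 0.5]   # samples of a candidate X
    n = [2.0, 2.0, 2.0, 2.0]   # samples of N
    for snr in (1.0, 10.0, 1e6):
        # The gap to the Wiener cost shrinks as the SNR grows.
        print(snr, channel_cost(r, x, n, snr) - wiener_cost(r, x))
```

Note that the sketch only evaluates the two costs for a given $X$. It does not enforce $X \in \mathcal{H}_2$, which is what makes the actual minimization non-trivial.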



2) As a Feed-Forward Control Problem: Fig. 1 may be 
interpreted as follows: The source signal is a disturbance 
that will affect some system where a controller (the de- 
coder) can compensate. The controller has a remote sensor 
that measures the disturbance and transmits information 
to the controller over the channel. In this interpretation 
the delay block may also include any dynamics that 
the disturbance passes through on the way. A similar 
interpretation was discussed in [9].

A similar problem setup was studied in [23], where
information theory was used to find a lower bound on 
the reduction of entropy rate made possible by side 
information communicated through a general channel with 
known capacity. Under stationarity assumptions, this was 
used to derive a lower bound, which is a generalization of 
Bode's integral equation, on a sensitivity-like function. 

II. Notation 

The techniques in this paper rely on concepts from 
functional analysis, such as $\mathcal{L}_p$ (Lebesgue), $\mathcal{H}_p$ (Hardy)
and $\mathcal{N}^+$ (Smirnov) function classes and inner-outer factorizations.
To conserve space, only some of the most
important facts will be given here. The interested reader 
is referred either to [IJ or to [23], [5S] and [5S] for the 
remaining relevant definitions and theorems. 

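The natural logarithm is denoted log. The complex unit
circle is denoted by $\mathbb{T}$. The singular value decomposition
of $A$ is taken as $A = U \Sigma V^*$, where $\Sigma$ is square. A singular
value decomposition of a transfer matrix $X \in \mathcal{L}_p$ is defined
pointwise on $\mathbb{T}$ as

$$X(e^{i\omega}) = U(e^{i\omega}) \Sigma(e^{i\omega}) V(e^{i\omega})^*,$$

where $U, V \in \mathcal{L}_\infty$ and $\Sigma \in \mathcal{L}_p$.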

For matrix-valued functions $X(z)$, $Y(z)$ defined on $\mathbb{T}$,
define 



and the norms 
where $\|\cdot\|_F$ is the Frobenius norm.

When a function in $\mathcal{H}_p$ is evaluated on $\mathbb{T}$, it is to
be understood as the radial limit $\lim_{r \to 1^+} X(rz)$. The
arguments of transfer matrices will often be omitted when 
they are clear from the context. Equalities and inequalities 
involving functions evaluated on T are to be interpreted 
as holding almost everywhere on T. 

III. Problem Formulation

Consider the system in Fig. 1. The source $s$, source
noise m and channel noise n are assumed to be mutually 




Fig. 2. A representation of the problem in the frequency domain. 
The transfer functions S, M and N are spectral factors of the 
source, measurement noise and channel noise, respectively. The delay 
constraint is determined by P. The encoder and decoder filters are 
given by $C$ and $D$. $W$ is an optional frequency weight.



independent, stationary Gaussian$^2$ sequences with zero 
mean and known covariance functions. The communica- 
tion channel has additive noise and a power constraint. 
That is, 



$$r = t + n \tag{5}$$

$$E\big(t(k)^2\big) \le \sigma^2. \tag{6}$$



Denote the encoder mapping by $\gamma(\cdot)$ and the decoder
mapping by $\delta(\cdot)$. The encoder and decoder are assumed to
be causal LTI filters with inputs $s + m$ and $r$, respectively.
The estimate of the source sequence is

$$\hat{s} = \delta(t + n) = \delta(\gamma(s + m) + n). \tag{7}$$



Denoting the delay, in number of samples, by d, the 
reconstruction error is 



$$e(k) = s(k - d) - \hat{s}(k). \tag{8}$$



The objective is to choose the encoder and decoder to 
minimize the stationary value of the MSE, or $E(e(k)^2)$,
subject to the power constraint. 

Due to the linearity assumption, the problem can be 
formulated in the frequency domain, as is illustrated in 
Fig. 2. In this formulation, all the inputs are mutually
independent, zero mean, white noise sequences with unit
variance. The transfer functions $S(z)$, $M(z)$ and $N(z)$ are
spectral factors of the sequences $s$, $m$ and $n$, respectively.
The encoder and the decoder are represented by the
transfer functions $C(z)$ and $D(z)$. In this formulation, the
problem has been generalized in two aspects: 

• The delay is replaced by a general LTI filter $P$. That is, the objective is to estimate the source signal after it has passed through $P$.
• The error $e$ is passed through an LTI filter $W$, representing a frequency weighting function, before minimization.

It is assumed that $S, M, N, P, W \in \mathcal{H}_\infty$, that $N$, $W$ are
invertible in $\mathcal{H}_\infty$ and that



$$\exists \varepsilon > 0 \text{ such that } SS^* + MM^* \ge \varepsilon \text{ on } \mathbb{T}, \tag{9}$$



$^2$Since only linear solutions are considered, it does not matter if 
the source, measurement noise or the channel noise are Gaussian 
or not. Linear solutions may, of course, be more or less suboptimal 
depending on the distributions. 



which implies that $S$ and $M$ have no common zeros on
the unit circle (an equivalent condition if $S(z)$ and $M(z)$
are rational functions).

The objective is to choose C and D to minimize the 
stationary variance of e after filtering by W . By expressing 
the z-transform of e in terms of the transfer functions in 
Fig. 2, this quantity can be expressed as

$$J(C, D) = \|W(P - DC)S\|_2^2 + \|WDCM\|_2^2 + \|WDN\|_2^2. \tag{10}$$



Similarly, the power constraint on t can be written as 



$$\|CS\|_2^2 + \|CM\|_2^2 \le \sigma^2. \tag{11}$$



It follows from (10) and (11) that $C$ and $D$ need to be
square integrable on the unit circle in order for $J(C, D)$
to be finite and the power constraint to be satisfied. Since
the encoder and decoder also should be causal and stable,
this implies that the optimization should be performed
over $C, D \in \mathcal{H}_2$.
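
To make (10) and (11) concrete, the following sketch evaluates both expressions from samples of the scalar frequency responses on a uniform grid of the unit circle. Approximating the squared 2-norms by grid averages, and the function names, are assumptions made here for illustration. The sketch does not check causality or stability of $C$ and $D$.

```python
def grid_mean_sq(values):
    """Grid approximation of a squared 2-norm: average of |v|^2 over the samples."""
    return sum(abs(v) ** 2 for v in values) / len(values)


def objective_and_power(C, D, S, M, N, P, W):
    """Return (J, power) following (10) and (11) for sampled scalar responses."""
    # Source contribution: W (P - D C) S
    sig = [w * (p - d * c) * s for w, p, d, c, s in zip(W, P, D, C, S)]
    # Measurement-noise contribution: W D C M
    meas = [w * d * c * m for w, d, c, m in zip(W, D, C, M)]
    # Channel-noise contribution: W D N
    chan = [w * d * n for w, d, n in zip(W, D, N)]
    J = grid_mean_sq(sig) + grid_mean_sq(meas) + grid_mean_sq(chan)
    # Transmit power: squared 2-norms of C S and C M
    power = (grid_mean_sq([c * s for c, s in zip(C, S)])
             + grid_mean_sq([c * m for c, m in zip(C, M)]))
    return J, power


if __name__ == "__main__":
    n_grid = 8
    const = lambda v: [v] * n_grid  # constant (memoryless) responses, for illustration
    J, power = objective_and_power(const(2.0), const(0.25), const(1.0),
                                   const(0.5), const(1.0), const(1.0), const(1.0))
    print(J, power)
```

With constant responses the example corresponds to white signals and static gains, so the grid averages are exact. For frequency-dependent responses the accuracy depends on the grid density.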

IV. Optimal Linear Encoder and Decoder 

The problem of finding an optimal linear encoder 
and decoder will first be solved in the scalar case. The 
solution will then, under some additional assumptions, be 
generalized to the vector case. 

A. Scalar case 

The objective function $J(C, D)$ is clearly not convex in
the pair (C, D) due to the appearance of the product DC. 
In order to find a minimum, the optimization problem will 
be solved in two steps. 

The idea is to first consider the product DC as given and 
then to find an optimal factorization of this product. The 
factorization gives an analytical expression for the cost in 
terms of the product, which means that optimization of the 
objective may then be performed over the product. When 
an optimal product is found, the optimality conditions 
from the solution to the factorization problem can then 
be applied to find optimal C and D. 

First, however, it will be shown that the power constraint (11) can be equivalently written as

$$\|CH\|_2^2 \le \sigma^2, \tag{12}$$



where the function H has some nice properties. 

Lemma 1: Suppose that $S, M \in \mathcal{H}_\infty$ and that (9) holds.
Then there exists $H \in \mathcal{H}_\infty$ with $H^{-1} \in \mathcal{H}_\infty$ such that

$$HH^* = SS^* + MM^* \text{ on } \mathbb{T}. \tag{13}$$



Proof: By (9) and the factorization theorem in [27]
there exists an outer function $H \in \mathcal{H}_2$ such that (13) holds.
Since $S, M \in \mathcal{H}_\infty$ it follows that $H \in \mathcal{H}_\infty$. Moreover, it
follows from (9) that $\|H^{-1}\|_\infty \le 1/\sqrt{\varepsilon}$ and since $H$ is

outer it then follows from Lemma [H (in the appendix) 

that $H^{-1} \in \mathcal{H}_\infty$. ■
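
As a numerical companion to Lemma 1, the sketch below samples $|H|$ on the unit circle from (13) and estimates the constant $\varepsilon$ in (9) as the smallest sampled value of $|S|^2 + |M|^2$. Only the magnitude of $H$ is obtained this way. Computing the outer function $H$ itself requires a spectral factorization, which is not attempted here. The function name and the example transfer functions are hypothetical.

```python
import cmath
import math


def spectral_factor_magnitude(S, M, n=256):
    """Sample |H| from |H|^2 = |S|^2 + |M|^2 on n uniformly spaced points of the
    unit circle and return (samples of |H|, grid estimate of epsilon in (9))."""
    mags = []
    eps = float("inf")
    for k in range(n):
        z = cmath.exp(2j * math.pi * k / n)
        p = abs(S(z)) ** 2 + abs(M(z)) ** 2
        eps = min(eps, p)
        mags.append(math.sqrt(p))
    return mags, eps


if __name__ == "__main__":
    # Hypothetical example: S has a zero at z = 1, M is a constant gain, so the
    # two functions have no common zero on the unit circle and epsilon > 0.
    S = lambda z: 1 - 1 / z
    M = lambda z: 0.3
    mags, eps = spectral_factor_magnitude(S, M)
    # The sampled |1/H| never exceeds 1/sqrt(eps), in line with the proof.
    print(eps, max(1 / m for m in mags), 1 / math.sqrt(eps))
```

If both $S$ and $M$ vanished at the same point of the circle, the estimate of $\varepsilon$ would tend to zero as the grid is refined and $H^{-1}$ would no longer be bounded.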

Now, introduce $K = DC \in \mathcal{H}_1$. The objective (10) can
then be written as

$$\|W(P - K)S\|_2^2 + \|WKM\|_2^2 + \|WDN\|_2^2. \tag{14}$$



Note that the first two terms are constant for fixed $K$.
The minimum over $C$ and $D$, given $K$, is thus obtained
by minimizing the third term in (14) subject to (12) and
$K = DC$. This minimization problem is called the optimal
factorization problem.

The interpretation is that for any given product of the 
encoder and decoder, the contribution to the objective 
of the signals that pass through both the encoder and 
the decoder is not affected by the choice of the factors C 
and D — only their product matters. The channel noise, 
however, only passes through the decoder, which means 
that $D$ (and implicitly $C$ since $C = D^{-1}K$) should be
chosen to minimize the impact of the channel noise on 
the objective. The solution to the scalar version of the 
optimal factorization problem is given by the following 
lemma. 

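Lemma 2 (Optimal factorization, scalar case):
Suppose that $\sigma > 0$, $K \in \mathcal{H}_1$ and that $H, N, W \in \mathcal{H}_\infty$
are invertible in $\mathcal{H}_\infty$. Then the optimization problem

$$\underset{C, D \in \mathcal{H}_2}{\text{minimize}} \quad \|WDN\|_2^2 \tag{15}$$

$$\text{subject to} \quad K = DC, \quad \|CH\|_2^2 \le \sigma^2 \tag{16}$$

attains the minimum value

$$\frac{1}{\sigma^2} \|WKHN\|_1^2. \tag{17}$$

Moreover, if $K$ is not identically zero then $C, D \in \mathcal{H}_2$
are optimal if and only if $DC = K$ and

$$|C|^2 = \frac{\sigma^2}{\|WKHN\|_1} \left| \frac{WKN}{H} \right| \quad \text{on } \mathbb{T}. \tag{18}$$

If $K = 0$, then the minimum is achieved by $D = 0$ and
any function $C \in \mathcal{H}_2$ that satisfies $\|CH\|_2^2 \le \sigma^2$.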

Proof: If $K = 0$ the proof is trivial, so assume that $K$
is not identically zero. Then $C$ is not identically zero and
$D = KC^{-1}$. Then (16) and the Cauchy-Schwarz inequality
give

$$\|WDN\|_2^2 = \|WKC^{-1}N\|_2^2 \ge \frac{1}{\sigma^2} \|CH\|_2^2 \, \|WKC^{-1}N\|_2^2 \ge \frac{1}{\sigma^2} \big\langle |CH|, |WKC^{-1}N| \big\rangle^2 = \frac{1}{\sigma^2} \|WKHN\|_1^2.$$

This shows that (17) is a lower bound on the value.
Equality holds if and only if $|WKC^{-1}N|$ and $|CH|$ are
proportional on $\mathbb{T}$ and $\|CH\|_2^2 = \sigma^2$. It is easily verified
that this is equivalent to (18). Thus, $C$ and $D$ achieve the
lower bound if and only if $D = KC^{-1}$ and (18) holds.

It remains to show existence of such $C, D \in \mathcal{H}_2$. Note
that $WKNH^{-1} \in \mathcal{H}_1$ is not identically zero. Hence, by
Theorem 17.17 in [25], $\log |WKNH^{-1}| \in \mathcal{L}_1$. It follows
from the factorization theorem in [27] that there exists an
outer $C \in \mathcal{H}_2$ that satisfies (18). Thus

$$\|KC^{-1}\|_2^2 = \frac{1}{\sigma^2} \|WKHN\|_1 \, \|W^{-1}KHN^{-1}\|_1 < \infty,$$

so $D = KC^{-1} \in \mathcal{L}_2$. Since $K \in \mathcal{H}_1$ and $C \in \mathcal{H}_2$ is
outer it follows from Lemma 3] (in the appendix) that 

$D = KC^{-1} \in \mathcal{H}_2$. ■



Remark 1: Optimal $D$ satisfy

$$|D|^2 = \frac{\|WKHN\|_1}{\sigma^2} \left| \frac{KH}{WN} \right| \quad \text{on } \mathbb{T}. \tag{19}$$



Apparently, the magnitudes of C and D are both pro- 
portional to the square root of the magnitude of K. 
This provides some intuition to why the minimum value 
depends on the 1-norm of K . 
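
The magnitude conditions (18) and (19) are easy to check numerically. The sketch below computes $|C|^2$ and $|D|^2$ from samples of $K$, $H$, $N$ and $W$ on a uniform grid, with the 1-norm approximated by a grid average. The function name and the sample values are hypothetical. Only magnitudes are computed: obtaining causal, stable $C$ and $D$ with these magnitudes requires a spectral factorization, which is outside the scope of this sketch.

```python
def optimal_factor_magnitudes(K, H, N, W, sigma2):
    """Return (|C|^2 samples, |D|^2 samples, grid estimate of ||WKHN||_1)
    following (18) and (19). Assumes H, N, W are nonzero at every sample and
    that K is not identically zero."""
    wkhn = [abs(w * k * h * n) for w, k, h, n in zip(W, K, H, N)]
    a = sum(wkhn) / len(wkhn)  # grid approximation of ||WKHN||_1
    c2 = [sigma2 / a * abs(w * k * n) / abs(h)
          for w, k, h, n in zip(W, K, H, N)]
    d2 = [a / sigma2 * abs(k * h) / abs(w * n)
          for w, k, h, n in zip(W, K, H, N)]
    return c2, d2, a


if __name__ == "__main__":
    K = [1.0, 2j, -0.5, 0.25]
    H = [2.0, 1.0, 1.0, 4.0]
    N = [1.0, 1.0, 2.0, 1.0]
    W = [1.0, 0.5, 1.0, 1.0]
    sigma2 = 3.0
    c2, d2, a = optimal_factor_magnitudes(K, H, N, W, sigma2)
    power = sum(c * abs(h) ** 2 for c, h in zip(c2, H)) / len(H)
    cost = sum(d * abs(w * n) ** 2 for d, w, n in zip(d2, W, N)) / len(W)
    # The power constraint is met with equality and the cost equals the value (17).
    print(power, sigma2, cost, a * a / sigma2)
```

The check also confirms that $|C|^2 |D|^2 = |K|^2$ at every sample, as required by $K = DC$.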

Remark 2: The existence part of Lemma 2 shows that a 
particular solution, where C is outer, can be obtained. By 
using the freedom available in spectral factorization, it is 
possible to obtain other solutions, for example by changing 
the sign of both C and D, or by instead choosing D to 
be outer. More generally, in the rational case, any non- 
minimum phase zeros or time delays could be located in 
C or D. 

For any given $K$ an optimal encoder-decoder pair, under
the constraint that their product is $K$, is specified by
(18) and (19), respectively. An optimal $K$ can in turn
be obtained by inserting the minimum value of $\|WDN\|_2^2$
into (14) and minimizing

$$\varphi(K) = \|W(P - K)S\|_2^2 + \|WKM\|_2^2 + \frac{1}{\sigma^2} \big\| WK \begin{bmatrix} S & M \end{bmatrix} N \big\|_1^2$$



over K. This is a convex problem. That this procedure 
in fact solves the main problem is shown by the following 
theorem, which is the main result of this paper. 

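Theorem 1: Suppose that $M, N, S, P, W \in \mathcal{H}_\infty$, where
$N$ and $W$ are invertible in $\mathcal{H}_\infty$, that $\sigma > 0$ and that (9)
holds. Then the optimization problem

$$\underset{C, D \in \mathcal{H}_2}{\text{minimize}} \quad J(C, D) \tag{20}$$

$$\text{subject to} \quad \|CS\|_2^2 + \|CM\|_2^2 \le \sigma^2 \tag{21}$$

attains a minimum value that is equal to the minimum of
the convex optimization problem

$$\underset{K \in \mathcal{H}_2}{\text{minimize}} \quad \varphi(K), \tag{22}$$

which is attained by a unique minimizer.

Moreover, suppose $K \in \mathcal{H}_2$ is a solution to (22). If $K$
is not identically zero, then $C$ and $D$ solve (20) subject
to (21) if and only if $C \in \mathcal{H}_2$, $D = KC^{-1} \in \mathcal{H}_2$ and

$$|C|^2 = \frac{\sigma^2}{\big\| WKN \begin{bmatrix} S & M \end{bmatrix} \big\|_1} \, \frac{|WKN|}{\sqrt{|S|^2 + |M|^2}} \quad \text{on } \mathbb{T}. \tag{23}$$

If $K = 0$, then the solution to (20) and (21) is given by
$D = 0$ and any function $C \in \mathcal{H}_2$ that satisfies (21).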

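Proof: Define $H \in \mathcal{H}_\infty$ according to Lemma 1. Then
(21) is equivalent to $\|CH\|_2^2 \le \sigma^2$. Define the sets

$$\begin{aligned} \Theta &= \{(C, D) : C, D \in \mathcal{H}_2,\ \|CH\|_2^2 \le \sigma^2\}, \\ \Theta(K) &= \{(C, D) : (C, D) \in \Theta,\ K = DC\}. \end{aligned}$$

Then the infimum of $J(C, D)$ subject to (21) can be
written

$$\begin{aligned} \inf_{(C, D) \in \Theta} J(C, D) &= \inf_{K \in \mathcal{H}_1} \ \inf_{(C, D) \in \Theta(K)} J(C, D) \\ &= \inf_{K \in \mathcal{H}_1} \Big( \|W(P - K)S\|_2^2 + \|WKM\|_2^2 + \inf_{(C, D) \in \Theta(K)} \|WDN\|_2^2 \Big) \\ &= \inf_{K \in \mathcal{H}_1} \Big( \|W(P - K)S\|_2^2 + \|WKM\|_2^2 + \frac{1}{\sigma^2} \|WKHN\|_1^2 \Big) \\ &= \inf_{K \in \mathcal{H}_1} \varphi(K). \end{aligned} \tag{24}$$

The first equality is true by Theorem 17.10 in [25].
The second equality follows because the first two terms
in $\inf_{(C, D) \in \Theta(K)} J(C, D)$ are constant. The third equality
follows from application of Lemma 2 to perform the inner
minimization. The final equality follows from (13).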

It will now be shown that the minimum is attained in
(24) by a unique $K \in \mathcal{H}_2$. Completion of squares gives
that

$$\begin{aligned} \varphi(K) &= \|W(P - K)S\|_2^2 + \|WKM\|_2^2 + \frac{1}{\sigma^2} \|WKHN\|_1^2 \\ &= \|WPS\|_2^2 + \|WKH\|_2^2 - 2 \,\mathrm{Re}\, \langle WPSS^*, WKHH^{-1} \rangle + \frac{1}{\sigma^2} \|WKHN\|_1^2 \\ &= \|WPSS^*H^{-*} - WKH\|_2^2 + \frac{1}{\sigma^2} \|WKHN\|_1^2 + \eta, \end{aligned}$$

where $\eta$ is a constant that does not depend on $K$. Let
$X = WKH$ and $R = WPSS^*H^{-*} \in \mathcal{L}_\infty$. Minimizing
$\varphi(K)$ over $K \in \mathcal{H}_1$ is then equivalent to minimizing

$$\psi(X) = \|R - X\|_2^2 + \frac{1}{\sigma^2} \|XN\|_1^2 \tag{25}$$

over $X \in \mathcal{H}_1$. In the latter problem, it is sufficient to
consider $X$ such that $\psi(X) \le \psi(0) = \|R\|_2^2$. That is, only
$X$ satisfying

$$\|X\|_2 = \|R - X - R\|_2 \le \|R - X\|_2 + \|R\|_2 \le \sqrt{\psi(X)} + \|R\|_2 \le 2\|R\|_2 = r$$

need to be considered. Now, in the weak topology, $\psi(X)$ is lower semicontinuous
on $\mathcal{L}_2$ and the set $\{X : \|X\|_2 \le r\}$ is compact. This proves
the existence of a minimum. The minimum is unique since
$\psi(X)$ is strictly convex. Moreover, since $\|X\|_2 \le r$, it is
sufficient to minimize over $X \in \mathcal{H}_2$ instead of $\mathcal{H}_1$.

Suppose now that $X \in \mathcal{H}_2$ minimizes $\psi(X)$. From
$H^{-1}, W^{-1} \in \mathcal{H}_\infty$ it follows that $K = W^{-1}XH^{-1} \in \mathcal{H}_2$
attains the infimum value in (24) and that this value
is equal to the minimum of (22). Since the minimum is
attained in (24) and, by Lemma 2, there exists $(C, D) \in \Theta$
such that $J(C, D) = \varphi(K)$, it follows that the minimum
of (20) subject to (21) is attained.

The optimality condition (23) follows from the application
of Lemma 2 using that $|H| = \sqrt{|S|^2 + |M|^2}$. ■

Remark 3: (p{K) is convex, and ip{K) = (p{K). Thus, 



Since the optimal $K$ is unique, this shows that the
minimizing $K$ satisfies $K(e^{-i\omega}) = \overline{K(e^{i\omega})}$. Thus, $C$ can
be chosen to have this property as well, meaning that
$C$ can be approximated by a rational function with real
coefficients. The same holds for $D$.

Remark 4: It was noted in Remark 2 that the optimal 
factorization problem can have multiple solutions. To 
clarify, the optimal K is unique but there are multiple 
factorizations of K into C and D that achieve the 
minimum value of J{C,D). 

It is noted that the solution of the problem essentially 
amounts to minimizing the sum of a 2-norm and a 1-norm 
of the decision variable. The 2-norm represents the cost in 
the Wiener filter problem, and the 1-norm represents the 
contribution of the channel noise to the error variance. 
The SNR $\sigma^2$ determines the relative importance of the
two terms. For small SNR, the optimal K will have 
small magnitude since the channel noise dominates the 
transmitted signal. As the SNR becomes larger, the 
magnitude of K will become larger, and it will approach 
the Wiener filter in the limit when the SNR goes to infinity. 
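
The dependence on the SNR can be made explicit in a simple special case, assumed here only for illustration: white source and noise with $S = 1$, $M = m$ constant, $N = W = P = 1$ (zero delay), and $K$ restricted to a real constant gain $k$. The cost $\varphi$ then reduces to the scalar quadratic $(1-k)^2 + k^2 m^2 + k^2(1+m^2)/\sigma^2$, whose minimizer is available in closed form. The function names below are hypothetical.

```python
def static_cost(k, m2, snr):
    """phi(k) for constant gain k, with S = 1, |M|^2 = m2 and N = W = P = 1."""
    return (1.0 - k) ** 2 + k * k * m2 + k * k * (1.0 + m2) / snr


def optimal_static_gain(m2, snr):
    """Minimizer of static_cost over real k, obtained from d(phi)/dk = 0."""
    return 1.0 / ((1.0 + m2) * (1.0 + 1.0 / snr))


if __name__ == "__main__":
    m2 = 0.25
    wiener_gain = 1.0 / (1.0 + m2)  # static Wiener gain without channel noise
    for snr in (0.01, 1.0, 100.0, 1e6):
        k = optimal_static_gain(m2, snr)
        # k is small at low SNR and approaches the Wiener gain at high SNR.
        print(snr, k, wiener_gain, static_cost(k, m2, snr))
```

In this constant-gain case the 1-norm and the 2-norm of $K$ coincide, so the example shows the scaling with the SNR but not the difference between the two norms that appears for frequency-dependent $K$.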

B. Vector case 

In this section, the results in the previous section will 
be generalized to the case of vector-valued signals. 

Consider again the system in Fig. 2 and assume that
all signals are vector-valued and all systems are given
by their corresponding transfer matrices. The number of
elements in signal $s$ is denoted $n_s$ and so forth. That is,
$s(k) \in \mathbb{R}^{n_s}$. Matrix dimensions are not explicitly stated in
this section except when necessary. It is generally assumed 
that all matrices are of appropriate size. In addition to all 
the assumptions made in the scalar case, it is now also 
assumed that: 

1) The communication channel consists of $n_t$ parallel AWGN channels. The power constraint (6) is replaced by the total power constraint

$$E\big(t(k)^T t(k)\big) \le \sigma^2.$$

2) All input signals in Fig. 2 have identity covariance matrices. Moreover, $N(z) = W(z) = I$. That is, the channel noise is white with identity covariance and the frequency weight is uniform.

3) The number of elements in the signals satisfies

$$n_t \ge \min\{n_s, n_e\}, \tag{26}$$

where $C$ is $n_t \times n_s$ and $D$ is $n_e \times n_t$. If the number of channels $n_t$ were smaller than $n_s$ and $n_e$, then the product $DC$ could not have full rank. This means that optimization over $K = DC$ would have to include a rank constraint, which is very difficult to handle even in the static case.

4) The inequality (9) is replaced by the matrix version

$$\exists \varepsilon > 0 \text{ such that } SS^* + MM^* \ge \varepsilon I \text{ on } \mathbb{T}. \tag{27}$$

The objective is thus to minimize

$$J_v(C, D) = \|(P - DC)S\|_2^2 + \|DCM\|_2^2 + \|D\|_2^2$$

subject to

$$\|CS\|_2^2 + \|CM\|_2^2 \le \sigma^2. \tag{28}$$



The objective and the constraint are thus quite similar 
to the ones in the scalar case. It will be seen that the 
equivalent convex problem looks the same but that the 
optimality condition will, however, be more complicated. 
The main difference between the scalar and vector versions 
of the problem is that the optimal factorization (Lemma 
[5]) is much more difficult to prove in the vector case. 

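Lemma 3 (Optimal factorization, vector case):
Suppose that $\sigma > 0$, $K \in \mathcal{H}_1$, that $H \in \mathcal{H}_\infty$ is invertible
in $\mathcal{H}_\infty$ and that (26) holds. Then the optimization
problem

$$\begin{aligned} \underset{C, D \in \mathcal{H}_2}{\text{minimize}} \quad & \|D\|_2^2 \\ \text{subject to} \quad & K = DC, \quad \|CH\|_2^2 \le \sigma^2 \end{aligned}$$

attains the minimum value $\frac{1}{\sigma^2} \|KH\|_1^2$.

Moreover, suppose that $K$ is not identically zero and
let $K = K_i K_o$ be an inner-outer factorization and $K_o H = U_o \Sigma V^*$
be a singular value decomposition. Then $C, D \in \mathcal{H}_2$
are optimal if and only if

$$K = DC, \qquad \|CH\|_2^2 = \sigma^2, \tag{29}$$

$$DD^* = \frac{\|KH\|_1}{\sigma^2} K_i U_o \Sigma U_o^* K_i^*. \tag{30}$$

If $K = 0$ then the minimum is achieved by $D = 0$ and
any function $C \in \mathcal{H}_2$ that satisfies $\|CH\|_2^2 \le \sigma^2$.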
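
A static special case gives some feeling for the lemma. Assume, purely for illustration, that $H = I$ and that $K$ is a constant diagonal matrix, so that the 1-norm of $KH$ is the sum of the absolute values of the diagonal entries. Restricting $C$ and $D$ to be diagonal as well, the power split below meets the constraint with equality and reaches the value $\frac{1}{\sigma^2}\|K\|_1^2$. The function name is hypothetical, and the sketch does not cover non-diagonal or frequency-dependent $K$.

```python
import math


def diagonal_factorization(k_diag, sigma2):
    """Factor K = diag(k_diag) as D C with diagonal C and D, allocating the
    power sigma2 across channels in proportion to |k_i|. Assumes H = I and
    that at least one entry of k_diag is nonzero."""
    total = sum(abs(k) for k in k_diag)  # sum of singular values of diag(k_diag)
    c = [math.sqrt(sigma2 * abs(k) / total) for k in k_diag]
    d = [k / ci if ci > 0 else 0.0 for k, ci in zip(k_diag, c)]
    cost = sum(abs(di) ** 2 for di in d)  # squared Frobenius norm of D
    return c, d, cost


if __name__ == "__main__":
    k_diag, sigma2 = [3.0, 1.0], 4.0
    c, d, cost = diagonal_factorization(k_diag, sigma2)
    # Equal power per channel, for comparison: same product K, same total power.
    c_eq = [math.sqrt(sigma2 / 2)] * 2
    cost_eq = sum((k / ci) ** 2 for k, ci in zip(k_diag, c_eq))
    print(cost, sum(abs(k) for k in k_diag) ** 2 / sigma2, cost_eq)
```

In the example the proportional split gives a cost of 4, equal to the bound, whereas the equal split gives 5.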

Proof: If $K = 0$ the proof is trivial, so assume that
$K$ is not identically zero. Then neither $C$ nor $D$ is
identically zero and $\alpha = \|CH\|_2 > 0$. Now, suppose that
$C$, $D$ are feasible and that $\alpha < \sigma$. Then

$$C_\alpha = \frac{\sigma}{\alpha} C, \qquad D_\alpha = \frac{\alpha}{\sigma} D$$

are feasible and $\|D_\alpha\|_2 < \|D\|_2$. Hence, a necessary
condition for optimality is that $\|CH\|_2^2 = \sigma^2$.

The remainder of this proof is divided into three parts. 
First, the dual problem is considered. Then, it is shown 
that there is a saddle point and the optimality criteria 
are derived. Finally, existence of the solution is proven by 
construction. 

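Dual Problem: In order to avoid dealing with analyticity
constraints associated with $\mathcal{H}_2$, the search will temporarily
be relaxed to $C, D \in \mathcal{L}_2$. Later, it will be shown that there
are $C, D \in \mathcal{H}_2$ that satisfy the derived optimality criteria.
For $\lambda \ge 0$ and matrix-valued $\Phi \in \mathcal{L}_\infty$, introduce the
Lagrangian

$$\begin{aligned} L(C, D, \lambda, \Phi) &= \|D\|_2^2 + \lambda \big( \|CH\|_2^2 - \sigma^2 \big) - \langle \mathrm{Re}\, \Phi, \mathrm{Re}(DC - K) \rangle - \langle \mathrm{Im}\, \Phi, \mathrm{Im}(DC - K) \rangle \\ &= \|D\|_2^2 + \lambda \big( \|CH\|_2^2 - \sigma^2 \big) - \mathrm{Re}\, \langle \Phi, DC - K \rangle \\ &= \int \Big( \|D\|_F^2 + \lambda \|CH\|_F^2 - \mathrm{Re}\, \mathrm{tr}\big( \Phi^*(DC - K) \big) \Big) \frac{d\omega}{2\pi} - \lambda \sigma^2. \end{aligned} \tag{31}$$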



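The integrand in (31) can be rewritten as

$$\begin{aligned} & \|D\|_F^2 + \lambda \|CH\|_F^2 - \mathrm{Re}\, \mathrm{tr}\big( C \Phi^* D - \Phi^* K \big) \\ &\quad = \Big\| D - \tfrac{1}{2} \Phi C^* \Big\|_F^2 + \lambda \|CH\|_F^2 - \tfrac{1}{4} \|C \Phi^*\|_F^2 + \mathrm{Re}\, \mathrm{tr}\big( \Phi^* K \big) \\ &\quad = \Big\| D - \tfrac{1}{2} \Phi C^* \Big\|_F^2 + \mathrm{tr} \Big[ C \big( \lambda HH^* - \tfrac{1}{4} \Phi^* \Phi \big) C^* + \mathrm{Re}\, \Phi^* K \Big]. \end{aligned} \tag{32}$$

Only the first term depends on $D$. The contribution of
this term is minimized by

$$D = \tfrac{1}{2} \Phi C^*. \tag{33}$$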



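If (33) holds, then $L$ only depends on $C$ through the first
term inside the brackets in (32). Pointwise minimization
of that term gives

$$\inf_{C \in \mathcal{L}_2} \mathrm{tr} \Big[ C \big( \lambda HH^* - \tfrac{1}{4} \Phi^* \Phi \big) C^* \Big] = \begin{cases} 0, & 4 \lambda HH^* \ge \Phi^* \Phi \text{ on } \mathbb{T} \\ -\infty, & \text{otherwise.} \end{cases}$$

Moreover, the remaining term in (32) can be written

$$\mathrm{tr}(\Phi^* K) = \mathrm{tr}(\Phi^* DC) = \tfrac{1}{2} \mathrm{tr}(C \Phi^* \Phi C^*) = \tfrac{1}{2} \|\Phi C^*\|_F^2.$$

Thus, $\mathrm{tr}(\Phi^* K)$ is real and non-negative, and

$$\inf_{C, D \in \mathcal{L}_2} L = \begin{cases} \displaystyle \int \mathrm{tr}(\Phi^* K) \frac{d\omega}{2\pi} - \lambda \sigma^2, & 4 \lambda HH^* \ge \Phi^* \Phi \text{ on } \mathbb{T} \\ -\infty, & \text{otherwise.} \end{cases}$$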



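Introduce

$$\Psi = \frac{1}{2\sqrt{\lambda}}\,\Phi H^{-*}.$$

Then the dual problem can be written as

$$\underset{\lambda \ge 0,\ \Psi}{\text{maximize}} \quad 2\sqrt{\lambda}\int_{-\pi}^{\pi}\operatorname{tr}(\Psi^*KH)\,\frac{d\omega}{2\pi} - \lambda\sigma^2$$

subject to

$$\Psi^*\Psi \le I \text{ on } \mathbb{T}. \tag{34}$$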



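The dual function is concave in $\lambda$. Letting $\lambda = 0$ gives the value 0. Since $\operatorname{tr}(\Psi^*KH) > 0$ there exists $\lambda > 0$ that gives a positive value, so the optimal $\lambda$ is given by the first-order condition

$$\frac{1}{\sigma^2}\int_{-\pi}^{\pi}\operatorname{tr}(\Psi^*KH)\,\frac{d\omega}{2\pi} = \sqrt{\lambda},$$

obtained by differentiation with respect to $\lambda$. With this $\lambda$ the dual problem simplifies to

$$\underset{\Psi}{\text{maximize}} \quad \frac{1}{\sigma^2}\left(\int_{-\pi}^{\pi}\operatorname{tr}(\Psi^*KH)\,\frac{d\omega}{2\pi}\right)^2 \tag{35}$$

subject to (34).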
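The elimination of $\lambda$ can be illustrated with a small numerical check. In the sketch below, `a` stands for the value of the integral of $\operatorname{tr}(\Psi^*KH)$ and `sigma2` for $\sigma^2$; both numbers are arbitrary and chosen only for illustration. The dual function $2\sqrt{\lambda}\,a - \lambda\sigma^2$ is maximized at $\sqrt{\lambda} = a/\sigma^2$, where it takes the value $a^2/\sigma^2$ that appears in (35).

```python
import math


def dual_value(lam: float, a: float, sigma2: float) -> float:
    """Dual function 2*sqrt(lam)*a - lam*sigma2 for a fixed Psi."""
    return 2.0 * math.sqrt(lam) * a - lam * sigma2


def optimal_lambda(a: float, sigma2: float) -> float:
    """Stationary point of dual_value: sqrt(lam) = a / sigma2."""
    return (a / sigma2) ** 2


a, sigma2 = 3.0, 2.0  # illustrative values, not taken from the paper
lam_opt = optimal_lambda(a, sigma2)
best = dual_value(lam_opt, a, sigma2)

# Coarse scan of neighbouring lambda values; none should exceed the stationary value.
scan = [dual_value(lam_opt * (0.5 + 0.01 * i), a, sigma2) for i in range(101)]
```

Because the function is concave in $\lambda$ for $a \ge 0$, the stationary point is the global maximum, so no value in `scan` exceeds `best`.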

The integrand in (35) will now be maximized pointwise. Recall that $KH = K_iK_oH = K_iU_o\Sigma V^*$ and denote the number of rows of $K_o$ by $m$. Then $\Sigma$ is diagonal with diagonal elements $\sigma_k$, $k = 1, \dots, m$. Since $K$ is $n_e \times n_s$ the rank of $K$ is not greater than $\min\{n_e, n_s\}$ and thus

$$m \le \min\{n_e, n_s\}. \tag{36}$$

$K_o$ is row outer by definition and $H$ is outer by Corollary 4.7 in Plj. It follows that $K_oH$ is row outer and thus has full row rank. It follows that the singular values are positive: $\sigma_k > 0$, $k = 1, \dots, m$. Since $K_oH$ is wide (it has $n_s \ge m$ columns) it follows that $U_o$ is square and thus unitary.

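Define $U = K_iU_o$ and $\hat\Psi = U^*\Psi V$. Then it follows from (34) and $UU^* \le I$ that

$$\hat\Psi^*\hat\Psi = V^*\Psi^*UU^*\Psi V \le V^*\Psi^*\Psi V \le V^*V = I.$$

Using $\hat\Psi$, an upper bound can be obtained for the integrand in (35):

$$\begin{aligned}
\sup_{\Psi^*\Psi \le I}\operatorname{tr}(\Psi^*KH) &= \sup_{\Psi^*\Psi \le I}\operatorname{tr}(\Psi^*U\Sigma V^*) \\
&= \sup_{\Psi^*\Psi \le I}\operatorname{tr}(V^*\Psi^*U\Sigma) \\
&\le \sup_{\hat\Psi^*\hat\Psi \le I}\operatorname{tr}(\hat\Psi^*\Sigma) \\
&= \sum_{k=1}^{m}\ \sup_{|\hat\Psi_{kk}| \le 1} \sigma_k\overline{\hat\Psi}_{kk} = \sum_{k=1}^{m}\sigma_k.
\end{aligned}$$

The supremum is achieved if and only if $\hat\Psi = I$. Therefore, the upper bound is achieved by $\Psi$ if and only if $U^*\Psi V = I$ and $\Psi^*\Psi \le I$. The set of $\Psi$ satisfying these conditions can be parametrized as:

$$\Psi = UV^* + \Psi_0 = K_iU_oV^* + \Psi_0, \tag{37}$$

$$I \ge \Psi^*\Psi, \tag{38}$$

where $\Psi_0$ satisfies

$$0 = U^*\Psi_0V = U_o^*K_i^*\Psi_0V. \tag{39}$$

Pre-multiplying (39) with $U_o$ gives the equivalent condition

$$K_i^*\Psi_0V = 0. \tag{40}$$

Choosing, for example, $\Psi_0 = 0$ gives $\Psi = UV^*$, which attains the upper bound. Hence, the value of the dual problem is

$$\max_{\Psi}\ \frac{1}{\sigma^2}\left(\int_{-\pi}^{\pi}\operatorname{tr}(\Psi^*KH)\,\frac{d\omega}{2\pi}\right)^2 = \frac{1}{\sigma^2}\left(\int_{-\pi}^{\pi}\operatorname{tr}(VU^*U\Sigma V^*)\,\frac{d\omega}{2\pi}\right)^2 = \frac{1}{\sigma^2}\|KH\|_1^2.$$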

The maximizing dual variables are given by

$$\Phi = 2\sqrt{\lambda}\,\Psi H^* = 2\sqrt{\lambda}\,(K_iU_oV^* + \Psi_0)H^* \tag{41}$$

where $\Psi_0$ is such that (37), (38) and (40) hold, and

$$\lambda = \frac{1}{\sigma^4}\|KH\|_1^2. \tag{42}$$



Saddle Point: It will now be shown that there is a saddle 
point, which implies that the duality gap is zero. 



In the following, assume that (37), (38), (40), (41) and (42) hold. Then $\lambda$ and $\Phi$ are dual feasible. The point $(C, D, \lambda, \Phi)$ is a saddle point if and only if $C, D \in \mathcal{H}_2$ are primal feasible,

$$\lambda\left(\|CH\|_2^2 - \sigma^2\right) = 0 \tag{43}$$

and

$$L(C, D, \lambda, \Phi) = \inf_{\tilde C, \tilde D \in \mathcal{H}_2} L(\tilde C, \tilde D, \lambda, \Phi). \tag{44}$$

The saddle point conditions imply that $\|CH\|_2^2 = \sigma^2$ since $\lambda > 0$ and that $D = \tfrac12\Phi C^*$, as it was seen earlier that this follows from minimization of the Lagrangian.

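Suppose that the saddle point conditions hold. Then $C, D$ satisfy $K = DC$ and $D = \tfrac12\Phi C^*$. Moreover,

$$\begin{aligned}
DD^* &= \tfrac12 DC\Phi^* = \tfrac12 K\Phi^* = \sqrt{\lambda}\,K_iK_oH\left(VU_o^*K_i^* + \Psi_0^*\right) \\
&= \sqrt{\lambda}\left(K_iU_o\Sigma U_o^*K_i^* + K_iU_o\Sigma V^*\Psi_0^*\right).
\end{aligned}$$

Clearly, $DD^*$ and $K_iU_o\Sigma U_o^*K_i^*$ are Hermitian. Accordingly,

$$\Delta = K_iU_o\Sigma V^*\Psi_0^*$$

must be Hermitian. Now, by (40),

$$\Delta K_i = K_iU_o\Sigma V^*\Psi_0^*K_i = 0,$$

$$0 = \Delta K_i = \Delta^*K_i = \Psi_0V\Sigma U_o^*K_i^*K_i = \Psi_0V\Sigma U_o^*.$$

Hence, $\Delta = 0$ and

$$DD^* = \sqrt{\lambda}\,K_iU_o\Sigma U_o^*K_i^* = \frac{\|KH\|_1}{\sigma^2}\,K_iU_o\Sigma U_o^*K_i^*. \tag{45}$$

Suppose instead that $C, D \in \mathcal{H}_2$ satisfy $K = DC$, $\|CH\|_2^2 = \sigma^2$ and (45). Then $C, D$ are primal feasible and (43) is satisfied. Moreover,

$$L(C, D, \lambda, \Phi) = \|D\|_2^2 = \sqrt{\lambda}\int_{-\pi}^{\pi}\operatorname{tr}\left(K_iU_o\Sigma U_o^*K_i^*\right)\frac{d\omega}{2\pi} = \sqrt{\lambda}\int_{-\pi}^{\pi}\operatorname{tr}(\Sigma)\,\frac{d\omega}{2\pi} = \frac{1}{\sigma^2}\|KH\|_1^2,$$

so (44) holds and thus the saddle point conditions are satisfied. Since these assumptions and the saddle point conditions imply each other, they are equivalent.

To conclude, it has been shown that $(C, D, \lambda, \Phi)$ is a saddle point, which implies that $C, D \in \mathcal{H}_2$ achieve the claimed minimum, if and only if $K = DC$, $\|CH\|_2^2 = \sigma^2$ and (45) holds.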



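Existence of Solution: Define $B = \sqrt{\lambda}\,U_o\Sigma U_o^* \in \mathcal{L}_1$, which is Hermitian with real diagonal. Recall that $K_oH$ is row outer with singular values $\sigma_k > 0$, $k = 1, \dots, m$. From this and Lemma 5 it follows that $\log\sigma_k \in \mathcal{L}_1$. Since $U_o$ is unitary it also follows that $B$ is positive definite. Moreover,

$$\log\det B = \frac{m}{2}\log\lambda + \sum_{k=1}^{m}\log\sigma_k \in \mathcal{L}_1.$$

Therefore, according to the theorem in ^7\, there is an outer transfer matrix $D_o \in \mathcal{H}_2$ such that $B = D_oD_o^*$. Let $D = K_iD_o \in \mathcal{H}_2$ and $C = D_o^{-1}K_o$. Then

$$\begin{aligned}
C &= D_o^{-1}K_oHH^{-1} = D_o^{-1}U_o\Sigma V^*H^{-1} \\
&= D_o^{-1}U_o\Sigma U_o^*U_oV^*H^{-1} = \frac{1}{\sqrt{\lambda}}\,D_o^*U_oV^*H^{-1} \in \mathcal{L}_2.
\end{aligned}$$

Since $D_o$ is outer it follows from Lemma 4 that $C \in \mathcal{H}_2$. It can now be verified that $C$ and $D$ satisfy the optimality conditions:

$$DC = K_iD_oD_o^{-1}K_o = K_iK_o = K,$$

$$\begin{aligned}
\|CH\|_2^2 &= \|D_o^{-1}K_oH\|_2^2 = \int_{-\pi}^{\pi}\operatorname{tr}\left(H^*K_o^*D_o^{-*}D_o^{-1}K_oH\right)\frac{d\omega}{2\pi} \\
&= \int_{-\pi}^{\pi}\operatorname{tr}\left(V\Sigma U_o^*B^{-1}U_o\Sigma V^*\right)\frac{d\omega}{2\pi} = \frac{1}{\sqrt{\lambda}}\int_{-\pi}^{\pi}\operatorname{tr}(\Sigma)\,\frac{d\omega}{2\pi} = \sigma^2,
\end{aligned}$$

and

$$DD^* = K_iD_oD_o^*K_i^* = \sqrt{\lambda}\,K_iU_o\Sigma U_o^*K_i^*.$$

If the rank of $K$ does not equal $n_t$, then $C$ and $D$ are not of the required dimensions. $C$ is $m \times n_s$ and $D$ is $n_e \times m$, where, by (26) and (36), $m \le \min\{n_e, n_s\} \le n_t$. It is required that $C$ is $n_t \times n_s$ and that $D$ is $n_e \times n_t$. To solve this problem, let

$$\bar D = \begin{bmatrix} D & 0_{n_e \times (n_t - m)} \end{bmatrix} \in \mathcal{H}_2, \qquad \bar C = \begin{bmatrix} C \\ 0_{(n_t - m) \times n_s} \end{bmatrix} \in \mathcal{H}_2.$$

Noting that $\bar D\bar C = DC = K$, that $\|\bar CH\|_2 = \|CH\|_2$ and that $\bar D\bar D^* = DD^*$ it is finally concluded that $\bar C, \bar D$ are optimal. ■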
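The optimality conditions can be illustrated in the scalar case, where $K_iU_o\Sigma U_o^*K_i^*$ reduces to $|KH|$ and only magnitudes on the unit circle matter. The following sketch is not the construction used in the proof: it skips the spectral factorization and works with $|D|^2$ and $|C|^2$ sampled on a frequency grid. The transfer functions `K_vals` and `H_vals` and the value of `sigma2` are hypothetical choices made only for illustration.

```python
import cmath
import math


def scalar_optimal_split(K_vals, H_vals, sigma2):
    """Return (||KH||_1, ||CH||_2^2, ||D||_2^2) for |D|^2 = (||KH||_1 / sigma2) * |KH|."""
    n = len(K_vals)
    kh = [abs(k * h) for k, h in zip(K_vals, H_vals)]
    kh_l1 = sum(kh) / n                               # grid approximation of ||KH||_1
    d_sq = [kh_l1 / sigma2 * v for v in kh]           # |D|^2 on the grid
    c_sq = [abs(k) ** 2 / d for k, d in zip(K_vals, d_sq)]  # |C|^2 = |K|^2 / |D|^2
    power = sum(c * abs(h) ** 2 for c, h in zip(c_sq, H_vals)) / n
    d_norm_sq = sum(d_sq) / n
    return kh_l1, power, d_norm_sq


n_grid = 512
zs = [cmath.exp(2j * math.pi * idx / n_grid) for idx in range(n_grid)]
K_vals = [1.0 + 0.5 / z for z in zs]          # hypothetical K(z) = 1 + 0.5 z^-1
H_vals = [1.0 / (1.0 - 0.9 / z) for z in zs]  # hypothetical H(z) = 1 / (1 - 0.9 z^-1)
sigma2 = 2.0

kh_l1, power, d_norm_sq = scalar_optimal_split(K_vals, H_vals, sigma2)

# Alternative split with a frequency-flat |D|^2, scaled so that ||CH||_2^2 = sigma2.
kh_l2_sq = sum(abs(k * h) ** 2 for k, h in zip(K_vals, H_vals)) / n_grid
flat_d_norm_sq = kh_l2_sq / sigma2
```

Here `power` equals `sigma2` and `d_norm_sq` equals `kh_l1 ** 2 / sigma2`, in agreement with the value of the dual problem, while the flat alternative gives the larger value `flat_d_norm_sq` by the Cauchy-Schwarz inequality.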

Just as in the scalar case, the solution to the optimal factorization problem can be used to find an equivalent convex problem. This problem looks exactly the same in both cases. The theorem for the vector case is now stated.

Theorem 2: Suppose that $\sigma > 0$, $S, M, P \in \mathcal{H}_\infty$ and that (26) and (27) hold. Then the optimization problem

$$\underset{C, D \in \mathcal{H}_2}{\text{minimize}} \quad J(C, D) \tag{46}$$

$$\text{subject to} \quad \|C[S\ M]\|_2^2 \le \sigma^2 \tag{47}$$

attains a minimum value that is equal to the minimum of the convex optimization problem

$$\underset{K \in \mathcal{H}_2}{\text{minimize}} \quad \|(P - K)S\|_2^2 + \|KM\|_2^2 + \frac{1}{\sigma^2}\|K[S\ M]\|_1^2, \tag{48}$$

which is attained by a unique minimizer.

Moreover, suppose $K \in \mathcal{H}_2$ is a solution to (48). If $K$ is not identically zero, then $C, D \in \mathcal{H}_2$ solve (46) subject to (47) if and only if

$$K = DC, \qquad \|C[S\ M]\|_2^2 = \sigma^2, \qquad DD^* = \frac{\|K[S\ M]\|_1}{\sigma^2}\,K_iU_o\Sigma U_o^*K_i^*,$$

where $K_i$ is defined by an inner-outer factorization $K = K_iK_o$ and $U_o$ and $\Sigma$ are given by a singular value decomposition $K_oH = U_o\Sigma V^*$, where $H \in \mathcal{H}_\infty$ satisfies $H^{-1} \in \mathcal{H}_\infty$ and $HH^* = SS^* + MM^*$.

If $K = 0$, then the solution to (46) and (47) is given by $D = 0$ and any function $C \in \mathcal{H}_2$ that satisfies (47).

Proof: With the assumption (27), Lemma 1 holds in the matrix case as well. The rest of the proof is identical to the proof of Theorem 1 except that Lemma 3 is used instead of Lemma 2, with the obvious implications for the optimality conditions. ■

Remark 5: The assumption (26) may deserve some explanation. If there are too few communication channels relative to the dimensionality of s and e, the maximum rank of the product DC may be smaller than the smallest dimension of K. Then not all K would be realizable as a product of D and C, and a rank condition would have to be imposed on K in Theorem 2. In principle, this changes nothing, but the assumption is included in order to avoid formulating the solution in terms of an optimization problem that cannot be reliably solved.

C. Optimal LTI Filters Require Infinite Memory 

The structure of optimal linear encoders and decoders 
will now be studied. In particular, it will be shown that the 
optimal filters generally have non-rational transfer func- 
tions. This corresponds to systems with infinite memory, 
since it is generally impossible to find a finite dimensional 
state-space realization of such transfer functions. 

We consider the scalar case with white channel noise and rational $S$, $M$, $P$ and $W$. This implies that $N(z) = 1$ and that $R = WPSS^*H^{-*}$ is rational, where $H$ satisfies (13). Since $S$, $M$, $P$ and $W$ are proper, it can safely be assumed that $R$ is proper. If $R$ is not proper then it can be made proper by multiplying $H$ with $z^{-k}$, for a large enough $k$.

If we define

$$\psi(X) = \|R - X\|_2^2 + \frac{1}{\sigma^2}\|X\|_1^2, \tag{49}$$

the solution is given by solving the problem

$$\underset{X \in \mathcal{H}_2}{\text{minimize}} \quad \psi(X). \tag{50}$$



Recall that the minimum of (50) is attained and that it is a strictly convex problem. It will now be shown that a necessary condition for the minimum cannot be satisfied by a rational $X$ except in some special cases. To begin with, two simple observations are made:

1) If the solution $X$ to (50) is a rational function it can be factorized into inner-outer factors as $X = FX_o$. The outer factor $X_o$ is then a rational function that solves the optimization problem

$$\text{minimize} \quad \|F^*R - X_o\|_2^2 + \frac{1}{\sigma^2}\|X_o\|_1^2, \tag{51}$$

where $F^*R$ is a rational function. Thus, we can assume without loss of generality that the optimal solution $X$ is outer.
2) Due to the orthogonality

$$\|R - X\|_2^2 = \|R_-\|_2^2 + \|R_+ - X\|_2^2, \tag{52}$$

where $R = R_+ + R_-$ is a decomposition of $R$ into the analytical and anti-analytical parts, respectively, we can also assume that $R$ is analytical since the anti-analytical part does not affect the optimization. That is, $R = R_+$.
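The orthogonality relation (52) can be illustrated through Parseval's identity, working directly with Fourier coefficients. In the sketch below, a function on the unit circle is represented as a dictionary from coefficient index to value, and it is assumed, purely as a convention for this example, that indices $k \ge 0$ make up the analytical part. The coefficient values are arbitrary.

```python
def norm_sq(coeffs):
    """Squared 2-norm via Parseval: the sum of squared coefficient magnitudes."""
    return sum(abs(c) ** 2 for c in coeffs.values())


def subtract(a, b):
    """Coefficient-wise difference a - b of two coefficient dictionaries."""
    return {k: a.get(k, 0.0) - b.get(k, 0.0) for k in set(a) | set(b)}


def split_analytic(r):
    """Split into the part with indices k >= 0 and the part with indices k < 0."""
    plus = {k: c for k, c in r.items() if k >= 0}
    minus = {k: c for k, c in r.items() if k < 0}
    return plus, minus


R = {-2: 0.5, -1: -1.0, 0: 2.0, 1: 0.25}  # illustrative two-sided coefficients
X = {0: 1.0, 1: 0.5, 2: -0.75}            # illustrative analytical candidate

R_plus, R_minus = split_analytic(R)
lhs = norm_sq(subtract(R, X))
rhs = norm_sq(R_minus) + norm_sq(subtract(R_plus, X))
```

Both sides evaluate to the same number, and `norm_sq(R_minus)` does not depend on `X`, which is why the anti-analytical part can be dropped from the minimization.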
Another assumption we make to simplify the proof is that the function $R$ has only simple poles. Note that the poles of $P_+(F^*R)$ are the same as of $P_+R$, so the simplicity of the poles remains true through the two rewritings above.

Theorem 3: Consider the problem (50) with a proper non-constant rational function $R \in \mathcal{H}_2$ and assume that the poles of $R$ are simple and that the optimal solution $X$ is not identically zero. Then $X$ is not a rational function.

Proof: We split the proof into several steps to underline the structure.

Step 1: Calculate the first variation of the functional $\psi$ and state the Euler-Lagrange equation. The standard differentiation of $\psi(X + \epsilon h)$ with respect to $\epsilon$ and then setting $\epsilon = 0$ gives

$$\delta\psi(h) = 2\operatorname{Re}\int_{-\pi}^{\pi}\left(\frac{\|X\|_1}{\sigma^2}\,\frac{X^*}{|X|} - (R - X)^*\right)h\,\frac{d\omega}{2\pi}.$$

For convex problems the necessary and sufficient condition for the minimum is that $\delta\psi(h) = 0$ for all $h \in \mathcal{H}_2$. It gives the Euler-Lagrange equation for the optimal $X$ as

$$\frac{\|X\|_1}{\sigma^2}\,P_+\frac{X}{|X|} = R - X,$$

where $P_+$ is the standard orthogonal projection from $\mathcal{L}_2$ to $\mathcal{H}_2$. Note that the constant $\|X\|_1\sigma^{-2}$ can be incorporated into $X$ and $R$ without affecting their rationality. So in the following we assume without loss of generality that this constant is equal to 1 and analyze the equation

$$P_+\frac{X}{|X|} = R - X. \tag{53}$$

Since $X$ is not identically zero, it is nonzero almost everywhere on $\mathbb{T}$ and the fraction $X/|X|$ is well defined.

Step 2: We will now assume that the solution $X \in \mathcal{H}_2$ to (53) is rational and show that it will lead us to a contradiction. In this step we prove that rationality of $R$ and $X$ in (53) implies rationality of $|X|$. Indeed

$$|X| = X^*\frac{X}{|X|} = X^*P_+\frac{X}{|X|} + X^*\left(I - P_+\right)\frac{X}{|X|}.$$

The second term in the right hand side is anti-analytical, hence

$$P_+|X| = P_+\left(X^*P_+\frac{X}{|X|}\right). \tag{54}$$

Clearly $P_+X|X|^{-1}$ is rational due to (53) and thus the right hand side of (54) is rational too. Accordingly, $P_+|X|$ is also rational. Furthermore, the function $|X|$ is real and has a symmetric Laurent series. Therefore, the function $|X|$ must be rational itself.

Factorization as $|X| = h^*h = |h|^2 = |h^2|$ with an outer rational $h \in \mathcal{H}_2$ and assuming, as was explained previously, that $X$ is outer, gives the only possibility that $X = h^2$. That is, the rational solution $X$ must be a square of a rational function.

Step 3: Rewrite the Euler-Lagrange equation in terms of $h$ and then in terms of numerators and denominators of $h$ and $R$. Substituting $X = h^2$ into (53) gives

$$P_+\frac{X}{|X|} = P_+\frac{h^2}{h^*h} = P_+\frac{h}{h^*} = R - h^2.$$

Introduce the notations for the numerators and the denominators

$$h = \frac{p}{q}, \qquad R = \frac{b}{a},$$

where $a$, $b$, $p$ and $q$ are polynomials. The polynomials $a$, $p$ and $q$ are stable by definition, since $R \in \mathcal{H}_2$ and $h$ is outer in $\mathcal{H}_2$. Introduce the notation for the conjugate polynomial to $p$ as

$$\tilde p(z) = z^n p(z^{-1}),$$

where $n$ is the degree of $p$. The conjugate of a stable polynomial has the same degree and is anti-stable. With these notations in mind the Euler-Lagrange equation becomes

$$P_+\frac{p\,\tilde q\,z^{n-m}}{q\,\tilde p} = \frac{b}{a} - \frac{p^2}{q^2} = \frac{bq^2 - ap^2}{aq^2}. \tag{55}$$

Here $n$ and $m$ are degrees of $p$ and $q$ respectively.
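The expression under the projection in (55) comes from the identity $h/h^* = p\,\tilde q\,z^{n-m}/(q\,\tilde p)$ on the unit circle, which can be checked numerically. The sketch below assumes polynomials with real coefficients, stored lowest degree first, so that the conjugate polynomial $z^n p(z^{-1})$ is simply the reversed coefficient list. The polynomials `p` and `q` are hypothetical stable examples, not taken from the paper.

```python
import cmath
import math


def polyval(coeffs, z):
    """Evaluate a polynomial where coeffs[j] multiplies z**j."""
    return sum(c * z ** j for j, c in enumerate(coeffs))


def conjugate_poly(coeffs):
    """Coefficients of z^n p(1/z) for real coefficients: the reversed list."""
    return list(reversed(coeffs))


p = [0.5, 1.0]          # p(z) = z + 0.5, zero at -0.5
q = [0.06, -0.5, 1.0]   # q(z) = (z - 0.2)(z - 0.3)
n, m = len(p) - 1, len(q) - 1
p_tilde, q_tilde = conjugate_poly(p), conjugate_poly(q)

max_err = 0.0
for idx in range(64):
    z = cmath.exp(2j * math.pi * idx / 64)
    h = polyval(p, z) / polyval(q, z)
    lhs = h / h.conjugate()  # h / h^* evaluated on the unit circle
    rhs = (polyval(p, z) * polyval(q_tilde, z) * z ** (n - m)
           / (polyval(q, z) * polyval(p_tilde, z)))
    max_err = max(max_err, abs(lhs - rhs))
```

The two sides agree to rounding error at every grid point. The zeros of `p_tilde` and `q_tilde` are the reciprocals of those of `p` and `q`, which is what makes them anti-stable.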



Step 4: Calculate the projection in the left hand side of (55) and state the polynomial version of the Euler-Lagrange equation. We assume now that $n - m \ge 0$ and cover the opposite in the next step. Perform the partial fraction decomposition

$$\frac{p\,\tilde q\,z^{n-m}}{q\,\tilde p} = \frac{Q}{q} + \frac{r}{\tilde p},$$

where $Q$ is a polynomial and the degree of $r$ is less than $n$. Then

$$P_+\frac{p\,\tilde q\,z^{n-m}}{q\,\tilde p} = \frac{Q}{q}$$

and the equation (55) becomes

$$aqQ = bq^2 - ap^2.$$

Clearly $q(z) = 0$ implies $a(z) = 0$ since $p$ and $q$ are coprime, hence $a = qa_0$ where $a_0$ is a polynomial. Canceling $q$ above we get

$$a_0qQ = bq - a_0p^2.$$

Similarly $q(z) = 0$ implies $a_0(z) = 0$ and thus $a_0 = qa_1$. Canceling again gives

$$a_1qQ = b - a_1p^2.$$

Now it is clear that $a_1 = 1$ since otherwise $a_1(z) = 0$ would give $b(z) = 0$, which is impossible since $a$ and $b$ are also coprime. Finally, we have $a = q^2$, which contradicts the assumption that zeros of $a$ are simple unless $q = a = 1$. But for a proper non-constant $R$ this is impossible.



Step 5: The case $n - m < 0$ is similar. Denote $k = m - n$. The only difference is in the partial fraction decomposition

$$\frac{p\,\tilde q}{q\,\tilde p\,z^k} = \frac{Q}{q} + \frac{r}{\tilde p\,z^k},$$

where $Q$ is a polynomial and the degree of $r$ is less than $n + k = m$. The rest is exactly the same as in Step 4 with the same conclusion that $a = q^2$, which contradicts the assumption. ■

Because S, M and W are assumed to be rational and 
X = WKH it follows that K is rational if and only if 
X is rational. Clearly, if K is not rational, it cannot be 
factorized as K = DC with rational C and D. Thus, the 
transfer functions of optimal LTI encoders and decoders 
are not rational. 

As explained previously, this means that the filters can 
not be realized using finite memory. Obviously, approxi- 
mations have to be done for practical implementation. For 
example, impulse responses of the filters may be truncated. 
It remains to investigate the impact on the performance 
of such approximations. 
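As a rough illustration of what truncation costs, consider a hypothetical filter with the geometric impulse response $g_k = a^k$, $k \ge 0$, $|a| < 1$. This is a rational filter chosen only because its truncation error has a closed form; it is not one of the optimal filters discussed above. Keeping the first `n_taps` coefficients discards a tail with energy $a^{2 n_\mathrm{taps}}/(1 - a^2)$.

```python
def tail_energy_numeric(a: float, n_taps: int, n_tail: int = 2000) -> float:
    """Energy of the discarded coefficients a^k, k >= n_taps, summed numerically."""
    return sum((a ** k) ** 2 for k in range(n_taps, n_taps + n_tail))


def tail_energy_closed_form(a: float, n_taps: int) -> float:
    """Geometric-series value of the same tail: a^(2*n_taps) / (1 - a^2)."""
    return a ** (2 * n_taps) / (1.0 - a * a)


a = 0.9  # illustrative pole location
errors = {n_taps: tail_energy_closed_form(a, n_taps) for n_taps in (10, 30, 60)}
numeric_60 = tail_energy_numeric(a, 60)
```

For this example the error energy decays geometrically in the number of taps. Whether a comparable rate holds for the non-rational optimal filters is exactly the open question raised in the text.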

If the channel has noise-free feedback, that is, if C has access to the channel output, then C can estimate the states of D exactly. It would be interesting to study if the memory of optimal linear encoders and decoders could be bounded in this case. Such a result would also be in line with the structural result for causal coders in [11], where the memory was bounded given that the encoder has knowledge of the decoder state.

V. Numerical Solution 

A procedure for obtaining an approximate numerical 
solution will now be outlined for the vector version of the 
problem. 

1) The first step is to solve the optimization problem 
(48) or, alternatively, minimize (25) (the constant
part T] must then be added to obtain the distortion). 
An approximate solution can be obtained by using a 
finite basis representation of K and approximating 
the integrals by sums over a finite number of fre- 
quency grid points. Such an approximated problem 
can be cast as a quadratic program with second- 
order cone constraints. 

2) Perform a matrix spectral factorization of $SS^* + MM^*$ to obtain $H \in \mathcal{H}_\infty$ with $H^{-1} \in \mathcal{H}_\infty$.

3) Perform an inner-outer factorization of $K$ to obtain $K_iK_o = K$.

4) Perform a singular value decomposition of $K_oH$ to obtain $U_o\Sigma V^* = K_oH$.

5) Use a finite basis approximation $A(\omega)$ of $DD^*$, for example using the parametrization

$$A(\omega) = A_0 + \sum_{k=1}^{N_e} A_k\left(e^{ik\omega} + e^{-ik\omega}\right)$$

and fit $A(\omega)$ to

$$\frac{\|K[S\ M]\|_1}{\sigma^2}\,K_iU_o\Sigma U_o^*K_i^*$$

by minimizing the deviation in some suitable norm.

6) Perform a spectral factorization of $A(\omega)$, choosing $D_o$ as the stable and outer spectral factor.

7) Let $D = K_iD_o$ and $C = D_o^{-1}K_o$.

8) If $C$ and $D$ are of incorrect size, add rows of zeros to $C$ and columns of zeros to $D$ until they are of correct size.
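Step 5 can be sketched for the scalar case as follows. The code assumes that the "suitable norm" is the 2-norm on a uniform frequency grid, in which case the fit reduces to projecting the target onto the functions $1$ and $e^{ik\omega} + e^{-ik\omega} = 2\cos k\omega$. This choice of norm, the function names and the target values are assumptions made for illustration; in particular the sketch does not enforce positivity of $A(\omega)$, which the spectral factorization in step 6 requires.

```python
import math


def fit_symmetric_series(target, n_terms):
    """Least-squares coefficients [A_0, ..., A_n_terms] from samples on a uniform grid."""
    n = len(target)
    omegas = [2.0 * math.pi * j / n for j in range(n)]
    coeffs = [sum(target) / n]
    for k in range(1, n_terms + 1):
        # Projection onto 2*cos(k*omega), whose mean square over the grid is 2.
        coeffs.append(sum(t * math.cos(k * w) for t, w in zip(target, omegas)) / n)
    return coeffs


def evaluate_series(coeffs, omega):
    """A(omega) = A_0 + sum_k A_k * (e^{ik omega} + e^{-ik omega})."""
    return coeffs[0] + sum(2.0 * a_k * math.cos(k * omega)
                           for k, a_k in enumerate(coeffs) if k >= 1)


n_grid = 256
grid = [2.0 * math.pi * j / n_grid for j in range(n_grid)]
# Illustrative positive target that lies exactly in the span of the basis.
target = [1.5 + 2.0 * 0.5 * math.cos(w) + 2.0 * 0.1 * math.cos(3.0 * w) for w in grid]
coeffs = fit_symmetric_series(target, n_terms=3)
max_fit_error = max(abs(evaluate_series(coeffs, w) - t) for w, t in zip(grid, target))
```

The projection formula relies on the cosines being orthogonal on the grid, which holds when the number of grid points exceeds twice the number of terms.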

In the scalar case, the procedure is simplified as follows: Steps 2 and 6 require only scalar spectral factorizations, steps 3, 4 and 8 are skipped and step 5 consists of fitting $A(\omega)$ to

$$\frac{\|K[S\ M]\|_1}{\sigma^2}\,|KH|.$$

A. Example 

The numerical solution is illustrated by the following example. Consider the problem with $S = 1/(z - 0.9)$, $M = 0$, $W = N = 1$ and $P = z^{-d}$. The functional $\psi(X)$, given by (25), was approximated by discretization of the integrals over 4000 grid points, uniformly placed on the unit circle. $X$ was parametrized as an FIR filter with 60 coefficients. The minimization was then carried out for different SNR levels $\sigma^2$ and delays $d$, using Matlab, Yalmip [28] and SeDuMi ^.
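The discretization described above can be sketched as follows. The code only evaluates a cost on the grid; it does not perform the minimization, which in the paper is done with Matlab, Yalmip and SeDuMi. As an assumption, the cost is taken to be the objective of (48) with $M = 0$, that is $\|(P - K)S\|_2^2 + \sigma^{-2}\|KS\|_1^2$, with $K$ an FIR filter in powers of $z^{-1}$. The function name and the tap values are hypothetical.

```python
import cmath
import math


def example_cost(taps, delay, sigma2, n_grid=4000):
    """Grid approximation of ||(P - K) S||_2^2 + ||K S||_1^2 / sigma2 for an FIR K."""
    fit_term = 0.0
    l1_term = 0.0
    for idx in range(n_grid):
        z = cmath.exp(2j * math.pi * idx / n_grid)
        S = 1.0 / (z - 0.9)   # source model from the example
        P = z ** (-delay)     # pure delay of `delay` samples
        K = sum(t * z ** (-i) for i, t in enumerate(taps))
        fit_term += abs((P - K) * S) ** 2
        l1_term += abs(K * S)
    return fit_term / n_grid + (l1_term / n_grid) ** 2 / sigma2


cost_zero = example_cost([], delay=2, sigma2=1.0)               # K = 0
cost_delay = example_cost([0.0, 0.0, 1.0], delay=2, sigma2=1.0)  # K = P
```

With $K = 0$ the cost reduces to $\|S\|_2^2 = 1/(1 - 0.9^2)$, which is the zero-SNR distortion. With $K = P$ the first term vanishes and only the channel-noise term $\|S\|_1^2/\sigma^2$ remains.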

The resulting MSE distortion levels are displayed in Fig. 3 together with the OPTA for the case with no delay constraint, obtained from ^. It can be seen that for small SNR's, the distortion is very close to the lower bound. This is not surprising since for zero SNR, the minimum distortion is $\|WPS\|_2^2 = \|S\|_2^2$ over any type of coding
system. For medium SNR's, the distortion is lower for 
longer delays. The difference seems, however, to decrease 
when the SNR becomes larger. The gap to the OPTA 
seems to approach about a factor two for high SNR's, 
regardless of delay. This suggests that for this source, it 
is the linearity, rather than the delay constraint, that is 
the performance-limiting factor for high SNR levels. 

VI. Conclusion 

This paper has shown how to find optimal LTI encoders 
and decoders for joint source-channel coding for Gaussian 
sources and channels. It has also been shown that such 
encoders and decoders in general require infinite memory. 
Thus, some approximation has to be done for numerical 
solution of the problem. It would be interesting to inves- 
tigate if the performance loss due to such approximations 
can be somehow bounded. 

In the scalar case, the solution has been extended to 
handle channels with feedback pj. This is not presented 
here to conserve space. Another extension is the problem 
of feedback control over AWGN channels, which will be 
the topic of an upcoming paper. 

Possible topics for further research include extending the solution in the MIMO case to channels with colored noise, investigating memory bounds when the channel has feedback and the suboptimality of linear solutions.
Fig. 3. MSE distortion as a function of SNR level (logarithmic scale) for optimal linear encoders and decoders for three different delay constraints (approximate solutions), and the OPTA for the case without delay constraint. (Problem parameters: $S = 1/(z - 0.9)$, $M = 0$, $W = N = 1$, $P = z^{-d}$.)

Appendix

Lemma 4: Suppose $Y \in \mathcal{N}^+$ is square and outer, $X \in \mathcal{N}^+$, and that $Y^{-1}X \in \mathcal{L}_p$. Then $Y^{-1}X \in \mathcal{H}_p$.

Proof: $Y^{-1} \in \mathcal{N}^+$ by Theorem 10 in [26]. It is easy to verify that the product of two $\mathcal{N}^+$ functions is $\mathcal{N}^+$. The proof follows from the fact that $\mathcal{L}_p \cap \mathcal{N}^+ = \mathcal{H}_p$ [H]. ■

Lemma 5: Suppose that $m \le n$ and that the $m \times n$ transfer matrix $X \in \mathcal{H}_p$, $p \in \{1, 2, \infty\}$, is row outer. Then the singular values of $X$ satisfy

$$\log\sigma_k \in \mathcal{L}_1, \qquad k = 1, \dots, m.$$

Proof: By Theorem 8 in [26] there exists a factorization $X = X_{co}X_i$, where $X_{co}$ is column outer and $X_i$ is inner. Since $X_{co}$ has full column rank on $\mathbb{T}$ it cannot have more columns than rows, and since $X$ is row outer $X_{co}$ cannot have fewer columns than rows. Thus $X_{co}$ is $m \times m$ and hence, by Theorem 10 in [26], $\det X_{co}$ is outer and thus $\det X_{co} \in \mathcal{N}^+$. According to a statement in section 17.19 in [2S] it follows that $\log|\det X_{co}| \in \mathcal{L}_1$.

For the singular values of $X$, it holds that

$$\begin{aligned}
\sum_{k=1}^{m}\log\sigma_k &= \log\prod_{k=1}^{m}\sigma_k = \tfrac12\log\det XX^* \\
&= \tfrac12\log\det X_{co}X_iX_i^*X_{co}^* = \tfrac12\log\det X_{co}X_{co}^* \\
&= \log|\det X_{co}| \in \mathcal{L}_1.
\end{aligned}$$

Furthermore, $\sigma_k \in \mathcal{L}_1$ since $X \in \mathcal{H}_p$. Because $\log\sigma_k < \sigma_k$ it holds that

$$\int_{-\pi}^{\pi}\log\sigma_k\,d\omega \le \int_{-\pi}^{\pi}\sigma_k\,d\omega < \infty, \qquad k = 1, \dots, m.$$

Since the sum of the logarithms is $\mathcal{L}_1$ and every term has an integral bounded from above, it follows that the integral of every term also must be bounded from below. That is,

$$\int_{-\pi}^{\pi}\log\sigma_k\,d\omega > -\infty, \qquad k = 1, \dots, m,$$

and hence $\log\sigma_k \in \mathcal{L}_1$, $k = 1, \dots, m$. ■